Perceptual coding of audio signals

ABSTRACT

A method is disclosed for determining estimates of the perceived noise masking level of audio signals as a function of frequency. By developing a randomness metric related to the Euclidean distance between (i) the actual amplitude and phase of the frequency components for each block of sampled values of the signal and (ii) predicted values for these components based on values in prior blocks, it is possible to form a tonality index which provides more detailed information useful in forming the noise masking function. Application of these techniques is illustrated in a coding and decoding context for audio recording or transmission. The noise spectrum is shaped based on a noise threshold and a tonality measure for each critical frequency band (bark).

.Iadd.This application is a continuation of application Ser. No. 08/106,499, filed on Aug. 13, 1993, abandoned..Iaddend.

FIELD OF THE INVENTION

The present invention relates to coding of time varying signals, such as audio signals representing voice or music information.

BACKGROUND OF THE INVENTION

Consumer, industrial, studio and laboratory products for storing, processing and communicating high quality audio signals are in great demand. For example, so-called compact disc (CD) digital recordings for music have largely replaced the long-popular phonograph records. More recently, digital audio tape (DAT) devices promise further enhancements and convenience in high quality audio applications. See, for example, Tan and Vermeulen, "Digital audio tape for data storage," IEEE Spectrum, October 1989, pp. 34-38. Recent interest in high-definition television (HDTV) has also spurred consideration of how high quality audio for such systems can be efficiently provided.

While commercially available CD and DAT systems employ elaborate parity and error correction codes, no standard presently exists for efficiently coding source information for high quality audio signals with these devices. Tan and Vermeulen, supra, note that (unspecified) data compression, among other techniques, can be used to increase capacity and transfer rate for DAT devices by a factor of ten over time.

It has long been known that the human auditory response can be masked by audio-frequency noise or by other-than-desired audio frequency sound signals. See, B. Scharf, "Critical Bands," Chap. 5 in J. V. Tobias, Foundations of Modern Auditory Theory, Academic Press, New York, 1970. While "critical bands," as noted by Scharf, relate to many analytical and empirical phenomena and techniques, a central feature of critical band analysis relates to the characteristic of certain human auditory responses to be relatively constant over a range of frequencies. Thus, for example, the loudness of a band of noise at a constant sound pressure remains constant as the bandwidth increases up to the critical band; then loudness begins to increase. In the cited Tobias reference, at page 162, there is presented one possible table of 24 critical bands, each having an identified upper and lower cutoff frequency. The totality of the bands covers the audio frequency spectrum up to 15.5 kHz. These effects have been used to advantage in designing coders for audio signals. See, for example, M. R. Schroeder et al, "Optimizing Digital Speech Coders By Exploiting Masking Properties of the Human Ear," Journal of the Acoustical Society of America, Vol. 66, pp. 1647-1652, December, 1979.

E. F. Schroeder and H. J. Platte, "'MSC': Stereo Audio Coding with CD-Quality and 256 kBIT/SEC," IEEE Trans. on Consumer Electronics, Vol. CE-33, No. 4, November 1987, describes a perceptual encoding procedure with possible application to CDs.

In J. D. Johnston, "Transform Coding of Audio Signals Using Perceptual Noise Criteria," IEEE Trans. on Selected Areas in Communications, February 1988, pp. 314-434 and .[.copending application Ser. No. 292,598, filed Dec. 30, 1988;.]. .Iadd.U.S. Pat. No. 5,535,300, issued Jul. 9, 1996 on application Ser. No. 284,324, filed Aug. 2, 1994, which is a continuation of Ser. No. 109,867, Aug. 20, 1993, U.S. Pat. No. 5,341,457, which is a continuation of Ser. No. 962,151, Oct. 16, 1992, abandoned, which is a continuation of Ser. No. 844,967, Feb. 28, 1992, abandoned, which is a continuation of Ser. No. 292,598, Dec. 30, 1988, abandoned, .Iaddend.by J. L. Hall II and J. D. Johnston, assigned to the assignee of the present invention, there are disclosed enhanced perceptual coding techniques for audio signals. Perceptual coding, as described in the Johnston, et al paper, relates to a technique for lowering required bitrates (or reapportioning available bits) in representing audio signals. In this form of coding, the masking threshold for unwanted signals is identified as a function of frequency of the desired signal. Then the coarseness of quantizing used to represent a signal component of the desired signal is selected such that the quantizing noise introduced by the coding does not rise above the noise threshold, though it may be quite near this threshold. While traditional signal-to-noise ratios for such perceptually coded signals may be relatively low, the quality of these signals upon decoding, as perceived by a human listener, is nevertheless high. In particular, the systems described in this paper and copending application use a human auditory model to derive a short-term spectral masking function that is implemented in a transform coder. Bitrates are reduced by extracting redundancy based on signal frequency analysis and the masking function. The techniques use a so-called "tonality" measure indicative of the shape of the spectrum over the critical bands of the signal to be coded to better control the effects of quantizing noise. As noted in the Johnston paper, supra, and the cited patent application Ser. No. 292,598, the masking effect of noise is dependent on the "tonelike or noiselike" nature of the signal. In particular, an offset for the masking threshold for each critical band is developed which depends on whether a "coefficient of tonality" for the signal in each critical band indicates that the signal is relatively more tonelike or noiselike. This coefficient of tonality is, in turn, conveniently derived from a measure of flatness of the spectrum of the signal over that critical band.

SUMMARY OF THE INVENTION

The present invention improves on the tonality based perceptual coding techniques described in the cited copending application Ser. No. 292,598. Because the frequency analysis typically involves determining spectral information at discrete frequencies ("frequency lines") within the audio spectrum, and because a number of these discrete frequencies will, in general, fall within each critical band, the processing described in the prior application Ser. No. 292,598 and the cited Johnston paper, illustratively grouped spectral values for frequencies within each critical band. That is, the spectral processing used to determine the tonality and masking threshold was typically accomplished on a critical-band-by-critical-band basis. The improvements made in accordance with aspects of the present invention permit grouping of values at discrete frequencies, but also include the use of a frequency-line-by-frequency-line analysis, rather than analysis on a spectrum-wide basis, in calculating the tonality metric values. This line-by-line calculation is advantageously based on a history of consecutive frames of the input power spectrum, rather than on the current frame alone. The present invention then advantageously determines improved estimates of perceptual thresholds on a line-by-line basis, rather than on a critical-band-by-critical-band basis. In appropriate cases, the critical band masking threshold can be used.

More particularly, the tonality estimate of the present invention advantageously uses a statistic of a plurality, typically two, of the previous time frames to predict the value of a given power spectrum frequency line in the current time frame. This process features the use of a Euclidean distance between the predicted line and the actual line in a present frame to estimate the tonality (or noisiness) of each spectral line. It proves convenient in these calculations to perform a normalization of the estimates using the predicted and actual values. These tonality estimates can then be combined, e.g., on a critical-band basis, to obtain an estimate of the actual tonality. This is done for each frequency to determine the noise-masking thresholds to be used in quantizing the frequency information to be finally coded for recording, transmission or other use.

A spreading operation known in the art, e.g., that is described generally in the Schroeder, et al paper, supra, is employed in an alternative implementation of certain aspects of the improved masking threshold determination process of the present invention. Spreading generally relates to the masking effect on a signal at a given frequency by signals separated in frequency from the given signal frequency. In the above cited prior application Ser. No. 292,598, and the Johnston paper, matrix processing is disclosed which involves signal spreading effects from signals many bark frequencies away. A bark is the term used to indicate a frequency difference of one critical band.

Other features and improvements of the present invention will appear from the following detailed description of an illustrative embodiment.

BRIEF DESCRIPTION OF THE DRAWING

FIG. 1 is a block diagram of an overall system based on the present invention;

FIG. 2 is a flow chart illustrating the masking threshold processing employed in an illustrative embodiment of the coder in accordance with the present invention; and

FIG. 3 shows a detailed block diagram of a decoder that may be used in the system of FIG. 1.

DETAILED DESCRIPTION

To simplify the present disclosure, copending application, Ser. No. 292,598, filed Dec. 30, 1988, by J. L. Hall II and J. D. Johnston, assigned to the assignee of the present invention; J. D. Johnston, "Transform Coding of Audio Signals Using Perceptual Noise Criteria," IEEE Journal on Selected Areas in Communications, Vol. 6, No. 2, February, 1988; and International Patent Application (PCT) WO 88/01811, filed Mar. 10, 1988 by K. Brandenburg are hereby incorporated by reference as if set forth in their entirety herein.

Also incorporated by reference as set forth in its entirety herein is a proposal submitted by the assignee of the present application, inter alia, to the International Standards Organization (ISO) on Oct. 18, 1989 for consideration by the members of that body as the basis for a standard relating to digital coding. This document, entitled "ASPEC," will hereinafter be referred to as "the ISO Document".

Application Ser. No. 292,598 describes a perceptual noise threshold estimation technique in the context of the well-known transform coder. See also, for example, N. S. Jayant and P. Noll, Digital Coding of Waveforms--Principles and Applications to Speech and Video, especially, Chapter 12, "Transform Coding."

The application WO 88/01811 describes the so-called OCF coder that may be used as one alternative to the transform coder described in the Jayant, et al reference or the application Ser. No. 292,598.

FIG. 1 of the present application discloses the overall organization of a system incorporating the present invention.

In that figure, an analog signal on input 100 is applied to preprocessor 105 where it is sampled (typically at 32 kHz) and each sample is converted to a digital sequence (typically 16 bits) in standard fashion. Preprocessor 105 then groups these digital values in frames (or blocks or sets) of, e.g., 512 digital values, corresponding to, e.g., 16 msec of audio input. Other typical values for these and other system or process parameters are discussed in the ISO Document.

It also proves advantageous to overlap contiguous frames, typically to the extent of 50 percent. That is, though each frame contains 512 ordered digital values, 256 of these values are repeated from the preceding 512-value frame. Thus each input digital value appears in two successive frames, first as part of the second half of the frame and then as part of the first half of the frame.
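By way of illustration only, the 50 percent overlap described above can be sketched in Python. The function name split_into_frames and its parameters are hypothetical and are not taken from Listing 1; dropping trailing samples that do not fill a whole frame is an assumption of this sketch.

```python
def split_into_frames(samples, frame_len=512, overlap=0.5):
    """Group samples into frames of frame_len values that overlap by the given fraction."""
    hop = int(frame_len * (1.0 - overlap))  # 256 new values per frame at 50 percent overlap
    if hop <= 0:
        raise ValueError("overlap must leave at least one new value per frame")
    frames = []
    start = 0
    # Assumption of this sketch: a trailing partial frame is discarded.
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop
    return frames


# Illustrative input: 1024 consecutive integers standing in for audio samples.
frames = split_into_frames(list(range(1024)))

# The second half of one frame is repeated as the first half of the next frame.
assert frames[1][:256] == frames[0][256:]

# At the typical 32 kHz sampling rate, 512 values span 16 msec.
frame_duration_ms = 1000.0 * 512 / 32000
```

With these typical values each frame contributes 256 new input values, so the frame rate is twice what non-overlapping 512-value frames would give.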

These frames are then transformed in standard fashion using, e.g., the modified discrete cosine transform (MDCT) described in Princen, J., et al, "Sub-band Transform Coding Using Filter Bank Designs Based on Time Domain Aliasing Cancellation," IEEE ICASSP, 1987, pp. 2161-2164. The well-known short-term Fast Fourier Transform (FFT) in one of its several standard forms can be adapted for such use as will be clear to those skilled in the art. The set of 257 complex coefficients (zero-frequency, Nyquist frequency, and all intermediate frequencies) resulting from the MDCT represents the short-term frequency spectrum of the input signal.

The complex coefficients are conveniently represented in polar coordinate or amplitude and phase components, indicated as "r" and "phi," respectively, in the sequel.
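The polar representation can be illustrated with the following Python sketch. It substitutes a plain discrete Fourier transform for the MDCT named above, purely to show how a 512-value frame yields 257 amplitude and phase pairs; the function name short_term_spectrum is hypothetical, and no windowing is applied.

```python
import cmath
import math


def short_term_spectrum(frame):
    """Return (r, phi): amplitude and phase for the lines from zero frequency to Nyquist.

    A naive DFT is used here as a stand-in for the transform described in the text.
    """
    n = len(frame)
    r, phi = [], []
    for k in range(n // 2 + 1):  # zero frequency, intermediate lines, Nyquist
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        magnitude, phase = cmath.polar(coeff)
        r.append(magnitude)
        phi.append(phase)
    return r, phi


# Illustrative input: one cycle of a cosine across an 8-value frame.
r_small, phi_small = short_term_spectrum(
    [math.cos(2 * math.pi * i / 8) for i in range(8)])
```

For the 8-value example the energy appears in line 1 only; a 512-value frame would give 257 lines in the same way.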

While not shown explicitly in FIG. 1, the present invention advantageously utilizes known "pre-echo" and dynamic windowing techniques described, for example, in the above-referenced ISO Document. Other pre-processing techniques that can be included in the functionality represented by preprocessor block 105 in FIG. 1 include those described in the ISO Document.

Perceptual coder block 110 shown in FIG. 1 includes the perceptual masking estimation improvements of the present invention and will be described in detail below. Quantizer/Coder block 115 in FIG. 1 represents the above-mentioned transform or OCF coder and related coder functionality described in the incorporated application Ser. No. 292,598 and the ISO Document.

Block 120 in FIG. 1 represents the recording or transmission medium to which the coded outputs of quantizer/coder 115 are applied. Suitable formatting and modulation of the output signals from quantizer/coder 115 is included in the medium block 120. Such techniques are well known to the art and will be dictated by the particular medium, transmission or recording rates and other system parameters.

Further, if the medium 120 includes noise or other corrupting influences, it may be necessary to include additional error-control devices or processes, as is well known in the art. Thus, for example, if the medium is an optical recording medium similar to the standard CD devices, then redundancy coding of the type common in that medium can be used with the present invention.

If the medium is one used for transmission, e.g., a broadcast, telephone, or satellite medium, then other appropriate error control mechanisms will advantageously be applied. Any modulation, redundancy or other coding to accommodate (or combat the effects of) the medium will, of course, be reversed upon the delivery from the channel or other medium to the decoder. The originally coded information provided by quantizer/coder 115 will therefore be applied at a reproduction device.

More particularly, these coded signals will be applied to decoder 130 shown in FIG. 1, and to perceptual decoder 140. As in the case of the system described in application Ser. No. 292,598, some of the information derived by perceptual coder 110 and delivered via quantizer/coder 115 and medium 120 to the perceptual decoder 140 is in the nature of "side information." Such side information is described more completely below and in the ISO Document. Other information provided by quantizer/coder 115 via medium 120 relating to the spectral coefficients of the input information is illustratively provided directly to decoder 130.

After processing the side information, perceptual decoder 140 provides decoder 130 with the additional information to allow it to recreate, with little or no perceptual distortion, the original spectral signals developed in pre-processor 105. These recreated signals are then applied to post-processor 150, where the inverse MDCT or equivalent operations and D/A functions are accomplished (generally as described in application Ser. No. 292,598) to recreate the original analog signal on output 160. The output on 160 is in such form as to be perceived by a listener as substantially identical to that supplied on input 100.

PERCEPTUAL THRESHOLD VALUES

With the overall system organization described above as background, and with the details of the incorporated application Ser. No. 292,598 as a baseline or reference, the improved process of calculating the threshold value estimates in accordance with the present invention will be described. The ISO Document should also be referred to for more detailed descriptions of elements of the present invention and for alternative implementations.

FIG. 2 is a flow chart representation of the processing accomplished in perceptual coder 110. Listing 1, attached, forms part of this application. This listing is an illustrative annotated FORTRAN program listing reflecting processing in accordance with aspects of the present invention relating to developing a noise masking threshold. A useful reference for understanding the FORTRAN processing as described herein is FX/FORTRAN Programmer's Handbook, Alliant Computer Systems Corp., July 1988. Likewise, general purpose computers like those from Alliant Computer Systems Corp. can be used to execute the program of Listing 1. Table 1 is a list of constants used in connection with the illustrative program of Listing 1.

While a particular programming language, well known to the art, is used in Listing 1, those skilled in the art will recognize that other languages will be appropriate to particular applications of the present invention. Similarly, constants, sampling rates and other particular values will be understood to be for illustrative purposes only, and in no sense should be interpreted as a limitation of the scope of the present invention.

FIG. 2 and Listing 1 will now be discussed in detail to give a fuller understanding of the illustrative embodiment of the present invention.

Function 200 in FIG. 2 indicates the start of the processing performed in determining the improved estimates of the masking thresholds in accordance with the present invention. Block 210 represents the initializing functions, using the absolute threshold values from Table 1, represented by block 220 in FIG. 2.

These initializing or startup operations are depicted explicitly in Listing 1 by the subroutine strt(). In this illustrative subroutine, threshold generation tables ithr and bval are set up first.

It should be noted that i is used, e.g., as the index for the critical bands, of the type described in the application Ser. No. 292,598, and has values from 0 to 25. The index i may be used with different ranges for other processing in other occurrences appearing in Listing 1.

In strt(), abslow is a constant assigned the indicated value to set the absolute threshold of hearing. rzotz is the desired sampling rate. rnorm is a normalization variable used in connection with the spreading function. openas is simply an operator used for opening an ASCII file. db is a dummy variable used to calculate table entries.

The actual threshold calculation begins with the sub-routine thrgen. Its variables r and phi are, of course, the spectral coefficients provided by preprocessor 105 in FIG. 1. They are vectors having 257 values (zero frequency, the Nyquist frequency and all intermediate components).

Block 210 represents the initialization, using the absolute threshold information in Table 1 (shown in block 220 in FIG. 2).

The next step in calculation of the perceptual threshold is the calculation of the tonality t(j) of the signal energy within each critical band j. This operation is indicated by block 230 in FIG. 2. The tonality metric is determined in accordance with the program of Listing 1 by forming

    dr(ω) = r_{t-1}(ω) - r_{t-2}(ω)

and

    dφ(ω) = φ_{t-1}(ω) - φ_{t-2}(ω).

dr and dφ are the differences between the radius (r(ω)) and phase (φ(ω)) of the previous calculation block and the one two previous. The calculation is done on a frequency line by frequency line (ω) basis. Note that if the blocks are shortened by the dynamic windowing technique referred to in the ISO Document, the frequency lines are duplicated accordingly, so that the number of frequency lines remains the same. Additionally, the difference is multiplied accordingly in such a dynamic windowing context, so that it represents the (estimated) difference over one differently sized block.

From the dr and dφ values and the previous r and φ, the "expected" radius and phase for the current block, denoted \hat{r}(ω) and \hat{φ}(ω), are calculated:

    \hat{r}(ω) = r_{t-1}(ω) + dr(ω)

and

    \hat{φ}(ω) = φ_{t-1}(ω) + dφ(ω),

where the ω and difference signals are again adjusted appropriately for the dynamic windowing, if present.
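The linear extrapolation from the two previous blocks can be sketched in Python as follows. The function name predict_polar is hypothetical; the sketch omits the dynamic windowing adjustments mentioned above and performs no phase wrapping, both of which are simplifying assumptions.

```python
def predict_polar(r_prev1, phi_prev1, r_prev2, phi_prev2):
    """Predict the current radius and phase of every frequency line.

    r_prev1/phi_prev1 hold the values from block t-1 and r_prev2/phi_prev2
    those from block t-2, one entry per frequency line.
    """
    r_hat, phi_hat = [], []
    for r1, p1, r2, p2 in zip(r_prev1, phi_prev1, r_prev2, phi_prev2):
        dr = r1 - r2          # dr = r_{t-1} - r_{t-2}
        dphi = p1 - p2        # dphi = phi_{t-1} - phi_{t-2}
        r_hat.append(r1 + dr)         # expected radius for block t
        phi_hat.append(p1 + dphi)     # expected phase for block t
    return r_hat, phi_hat


# Illustrative values for a single frequency line.
r_hat, phi_hat = predict_polar([2.0], [0.5], [1.0], [0.25])
```

A steady tone, whose amplitude is constant and whose phase advances by a fixed amount per block, is predicted exactly by this extrapolation, which is why the prediction error can serve as an indicator of noisiness.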

From these values and the actual values for the current spectrum, a randomness metric (c(ω)) is calculated: ##EQU1##

c values are used later to calculate the appropriate threshold in each critical band, through the calculation of t(j).
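The expression for c(ω) is not reproduced in the text above. One form consistent with the Summary (a Euclidean distance between the predicted and actual lines, normalized using the predicted and actual values) is sketched below in Python. The normalization by the sum of the actual radius and the magnitude of the predicted radius, the handling of a zero denominator, and the function name randomness are all assumptions of this sketch rather than a statement of the equation in Listing 1.

```python
import math


def randomness(r, phi, r_hat, phi_hat):
    """Assumed randomness metric for one frequency line.

    Computes the Euclidean distance between the actual line (r, phi) and the
    predicted line (r_hat, phi_hat) in the complex plane, divided by
    r + |r_hat| so that the result lies between 0 and 1.
    """
    dx = r * math.cos(phi) - r_hat * math.cos(phi_hat)
    dy = r * math.sin(phi) - r_hat * math.sin(phi_hat)
    denom = r + abs(r_hat)
    if denom == 0.0:
        return 0.0  # assumption: a line with no energy is treated as fully predictable
    return math.hypot(dx, dy) / denom


c_predictable = randomness(1.0, 0.3, 1.0, 0.3)        # prediction matches exactly
c_opposite = randomness(1.0, 0.0, 1.0, math.pi)       # prediction is in anti-phase
```

Under this assumed form a perfectly predicted line gives a value of 0 and a line whose prediction is in anti-phase gives a value near 1.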

Next, the critical band energy calculation is made, as indicated by block 240 in FIG. 2.

The energy in each critical band is ##EQU2## and the summed randomness metric, C(j), is ##EQU3##
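The expressions for P(j) and C(j) are not reproduced in the text above. The Python sketch below assumes that the band energy is the sum of the squared radii of the lines in the band and that the summed randomness metric is the energy-weighted average of c(ω) over the same lines. Both forms, the band_edges argument, and the function name are assumptions made for illustration; the actual band boundaries are those of Table 2.

```python
def band_energy_and_randomness(r, c, band_edges):
    """Return (P, C) per critical band under the assumed definitions.

    band_edges[j] and band_edges[j + 1] delimit the frequency lines of band j
    (the upper edge is exclusive).
    """
    P, C = [], []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        energy = sum(r[w] ** 2 for w in range(lo, hi))
        weighted = sum((r[w] ** 2) * c[w] for w in range(lo, hi))
        P.append(energy)
        # Assumption: an empty or silent band is given a randomness of zero.
        C.append(weighted / energy if energy > 0.0 else 0.0)
    return P, C


# Illustrative values: four frequency lines grouped into two hypothetical bands.
P, C = band_energy_and_randomness(
    r=[1.0, 1.0, 2.0, 2.0],
    c=[0.0, 1.0, 0.5, 0.5],
    band_edges=[0, 2, 4],
)
```

Weighting by energy means that strong lines dominate the band's randomness value, so a single strong tone in a band is not masked by many weak noisy lines.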

The C(j) are then converted to the tonality index, t(j), in two steps:

    tmp(j) = max(0.05, min(0.5, C(j))),

then

    t(j) = -0.43 * ln tmp(j) - 0.299
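The two-step conversion can be sketched in Python as follows. The sketch assumes that the coefficient of the logarithm is negative, so that the index runs from about 1 for a highly predictable (tonelike) band to about 0 for an unpredictable (noiselike) band, which is the range the subsequent masking SNR formula expects. The function name tonality_index is hypothetical.

```python
import math


def tonality_index(C_j):
    """Map the summed randomness metric C(j) to a tonality index t(j)."""
    tmp = max(0.05, min(0.5, C_j))           # limit C(j) to the range 0.05 .. 0.5
    return -0.43 * math.log(tmp) - 0.299     # natural logarithm; sign is an assumption


t_tonal = tonality_index(0.0)   # clipped to tmp = 0.05, giving roughly 0.99
t_noisy = tonality_index(1.0)   # clipped to tmp = 0.5, giving roughly 0.0
```

The clipping limits 0.05 and 0.5 are the points at which the logarithmic mapping reaches approximately 1 and 0, respectively.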

It is now possible to derive the unspread threshold values.

From the power and the tonality values, the unspread threshold uthr(j) is calculated. First, the proper value for the masking SNR (snr_{db}(j)), corresponding to frequency and tonality, is calculated in decibels:

    snr_{db}(j) = max(max(24.5, 15.5+j)*t(j) + 5.5*(1.-t(j)), fmin(j))

where fmin is tabulated in the ISO Document and in Table 2 as an energy ratio, rather than in dB. Table 2 also indicates critical band boundaries, expressed in terms of frequency lines for the indicated sampling rate. Then the ratio of masked noise energy to signal energy is calculated: ##EQU4## and the unspread threshold value is calculated:

    uthr(j)=P(j)*snr(j).
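The unspread threshold calculation can be sketched in Python under three stated assumptions: that the tone-masking term max(24.5, 15.5+j) is weighted by t(j) while the 5.5 dB noise-masking term is weighted by 1-t(j); that the ratio of masked noise energy to signal energy is obtained as 10 raised to the power -snr_db/10; and that the floor fmin is supplied in decibels. The function and parameter names are hypothetical.

```python
def unspread_threshold(P_j, t_j, j, fmin_db_j):
    """Return uthr(j) for critical band j from its energy and tonality index."""
    # Masking SNR in dB: interpolate between the tonal and the noise-like offsets.
    snr_db = max(max(24.5, 15.5 + j) * t_j + 5.5 * (1.0 - t_j), fmin_db_j)
    # Assumed conversion from dB to a noise-to-signal energy ratio.
    snr = 10.0 ** (-snr_db / 10.0)
    return P_j * snr


# Illustrative values: band 0 with energy 1000 and three different situations.
uthr_tonal = unspread_threshold(1000.0, 1.0, 0, 0.0)    # 24.5 dB below the band energy
uthr_noisy = unspread_threshold(1.0, 0.0, 0, 0.0)       # 5.5 dB below the band energy
uthr_floor = unspread_threshold(1.0, 0.0, 0, 20.0)      # fmin of 20 dB takes over
```

A tonelike band therefore tolerates much less quantizing noise than a noiselike band of the same energy.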

The spread threshold (sthr) is calculated from the unspread threshold, the snr(j), and the critical band energies, P(j), according to

    sthr(j)=max(uthr(j), snr(j)*P(i)*mask(i-j)[i>j])

where mask(i-j) is tabulated at the end of the ISO Document, and represents an example modified spreading function. Alternatively, the spreading may be accomplished using the function sprdgf(j, i) given in Listing 1.
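One reading of the spreading expression above is sketched in Python below: for each band j, the unspread threshold is compared with the contribution snr(j)*P(i)*mask(i-j) of every band i greater than j, and the largest value is kept. Taking the maximum over all such i is an interpretation of the bracketed condition, and the geometric mask used in the example is a hypothetical stand-in for the tabulated spreading function.

```python
def spread_threshold(uthr, snr, P, mask):
    """Return sthr(j) for every band, given a callable mask(distance_in_bands)."""
    n = len(uthr)
    sthr = []
    for j in range(n):
        best = uthr[j]
        for i in range(j + 1, n):  # only bands with i > j contribute
            best = max(best, snr[j] * P[i] * mask(i - j))
        sthr.append(best)
    return sthr


# Illustrative values: a strong middle band raises the threshold of the band below it.
sthr = spread_threshold(
    uthr=[0.1, 10.0, 0.1],
    snr=[0.1, 0.1, 0.1],
    P=[1.0, 100.0, 1.0],
    mask=lambda d: 0.1 ** d,   # hypothetical mask: 10 dB attenuation per band
)
```

In the example the first band's threshold rises from 0.1 to about 1.0 because of the energy in the adjacent band.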

After spreading, the spread threshold is compared to the absolute threshold, and the maximum substituted in the limited threshold, lthr(j). As noted in the Johnston paper cited above, this adjustment is made because it is not practical to specify a noise threshold that is lower than the level at which a person could hear noise. Any such threshold below the absolute level at which it could be heard could result in waste of resources. Thus the absolute threshold is taken into account by lthr(j)=max(sthr(j), absthr(j)), where absthr(j) is tabulated at the end of the ISO Document. Note that the absolute threshold is adjusted for actual block length.

Finally, the threshold is examined, after adjustment for block length factors, for narrow-band pre-echo problems. The final threshold, thr(j), is then calculated:

    thr(j)=min(lthr(j), 2*othr(j))

and othr is then updated:

    othr(j)=lthr(j).
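The absolute-threshold limiting and the pre-echo check can be sketched together in Python. The sketch takes the spread threshold as the input to the absolute-threshold comparison, which is an assumption about the intended operand, and omits the block length adjustments mentioned above. The function name limit_threshold is hypothetical.

```python
def limit_threshold(sthr, absthr, othr):
    """Return (thr, new_othr) for one block.

    sthr   : spread thresholds per band for the current block
    absthr : absolute thresholds of hearing per band
    othr   : limited thresholds retained from the previous block
    """
    # Never specify a threshold below the absolute threshold of hearing.
    lthr = [max(s, a) for s, a in zip(sthr, absthr)]
    # Pre-echo control: allow at most a doubling relative to the previous block.
    thr = [min(l, 2.0 * o) for l, o in zip(lthr, othr)]
    # The limited threshold becomes the reference for the next block.
    new_othr = list(lthr)
    return thr, new_othr


# Illustrative values for two bands.
thr, othr_next = limit_threshold(
    sthr=[1.0, 10.0], absthr=[2.0, 2.0], othr=[1.0, 1.0])
```

In the example the second band's threshold jumps from 1 to 10 between blocks, so the final threshold is held to 2, while the stored value for the next block is the unlimited 10.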

The threshold lthr(j) is transferred to a variable named lxmin(j) for use in the outer iteration loop described in the ISO Document.

A final step in the threshold calculation procedure calculates an entropy measure that is used to estimate the number of bits needed for the current signal block. This estimate is derived for use by the quantizer/coder 115 using ##EQU5##

This completes the perceptual threshold processes.

An output of the processing described above and in Listing 1 is a set of threshold values that the quantizer/coder 115 of FIG. 1 employs to efficiently encode the input signal information for transmission or storage as described above.

While the preceding description of an illustrative embodiment of the present invention has referred to a particular programming language and type of processors, it will be recognized by those skilled in the art that other implementations will be desirable in particular cases. For example, in consumer products size requirements may dictate that high performance general purpose or special purpose microprocessors like those from AT&T, Intel Corp. or Motorola be used. For example, various of the AT&T DSP-32 digital signal processing chips have proved useful for performing processing of the type described above. In other particular cases, special purpose designs based on well-known chip design techniques will be preferably employed to perform the above described processing.

The tonality metric is determined in the illustrative embodiment above using differences between the values of r(ω) and φ(ω) from the present block and the corresponding values from the two previous blocks. In appropriate cases, it may prove advantageous to form such a difference using only one prior value in evaluating these variables, or using a plurality greater than two of such prior values, as the basis for forming the expected current values.

Likewise, though values for certain of the variables described above are calculated for each spectral frequency line, it may prove to be an economical use of processing resources to calculate such values for fewer than all of such lines.

Aspects of the processing accomplished by quantizer/coder 115 and decoder 130 in FIG. 1 will now be described, based on materials included in the ISO Document.

The inputs to quantizer/coder 115 in FIG. 1 include spectral information derived by MDCT and other processing in accordance with functions represented by block 105 in FIG. 1, and outputs of perceptual coder 110, including the noise threshold information and perceptual energy information. Quantizer/coder 115 then processes this information and in doing so provides a bitstream to the channel or recording medium 120 in FIG. 1, which bitstream includes information divided into three main parts:

a first part containing the standardized side information, typically in a fixed length record;

a second part containing the scaling factors for the 23 critical bands and additional side information used for so-called adaptive window switching, when used; the length of this part can vary depending on information in the first part; and

a third part containing the entropy coded spectral values, typically in the form of the well-known two-dimensional Huffman code.

Typical apportionment for information provided by quantizer/coder 115 is summarized in Table 3.

    __________________________________________________________________________
    PART I
    __________________________________________________________________________
    sync word (0110111)            signals the start of the block      7 bit
    position of parts 2 & 3        difference between the last bit of  12 bit
    (bitsav)                       part 2 & 3 and the first bit of
                                   part 1
    word length selector for       selects by a table a word length    4 bit
    part 2 (cbtable)               for scaling factors for the 12
                                   lower critical bands between 0..4
                                   and for the higher critical bands
                                   between 0..3. Four combinations
                                   with a small expectation are
                                   unused
    number of big spectral values  number of pairs of spectral values  8 bit
    (bigvalues)                    that are coded with a two
                                   dimensional Huffman code, able to
                                   code values larger than 1 × 1 the
                                   so called small spectral values
    quantizer and global gain      level differences between original  7 bit
    information (Gain)             and quantized values in steps of
                                   2.sup.1
    Huffman codetable (iqfeld)     values 0..3 select a 4 × 4, 8 × 8,  4 bit
                                   16 × 16 or 32 × 32 codetable;
                                   values > 3 select a 32 × 32
                                   ESC-table where 31 is an
                                   ESC-character followed by (Huffman
                                   codetable-3) bits of linear
                                   transmitted part of the spectral
                                   value, that has to be added to the
                                   31
    pre-emphasis (preflag)         flag, that the higher part of the   1 bit
                                   spectrum is quantized with a
                                   smaller quantizer step size
    critical band scaling stepsize flag, whether the critical band     1 bit
    (ps-scale)                     scaling has a stepsize of 2 or
                                   2.sup.1
    block split (split-flag)       flag, whether the block is split    1 bit
                                   into subblocks (dynamic windowing)
    0/1 codetable (count 1 table)  selection of one of two possible    1 bit
                                   codebooks for the coding of small
                                   values (-1,0,1)
    DC-part of the signal                                              9 bit
    (dc-value)
                                                                       55 bit
    __________________________________________________________________________

PART II

The following bits are dependent on the side information of part 1 (e.g., subblock information is only needed if coding in subblocks is actually selected).

    __________________________________________________________________________
    global gain for subblock 2                                          3 bit
    DC-value of subblock 2                                              9 bit
    global gain for subblock 3                                          3 bit
    DC-value of subblock 3                                              9 bit
    global gain for subblock 4                                          3 bit
    DC-value of subblock 4                                              9 bit
    scaling factors for the lower 12 critical bands   12 * (0 . . . 4)  48 bit
    scaling factors for the higher 11 critical bands  11 * (0 . . . 3)  33 bit
                                                                        117 bit
    __________________________________________________________________________

PART III

Huffman coded spectral values about 0 . . . 4000 bit

A part of the Huffman code is ordered in a two-dimensional array, with the number of columns depending on the longest codeword of the Huffman codetable (5, 16, 18, 22 or 19 bits for ESC-tables). The number of rows is the size of Part III divided by the number of columns. The codewords of the higher frequencies that cannot be ordered into this rectangular array are fitted into the remaining gaps.

Signs of values not equal to 0 follow the codeword directly.

When using the ESC-table, up to 4 msb+sign of the linearly transmitted part follow the codeword directly; the lsb+sign are filled into the gaps.

********+xxxxxxxxxxx**********+mmmmxxx***-xxxxxxxxxxxxxxxxxx** . . .

1. start of row 2. start of row 3. start of row 4. . . .

* bits of Huffman codeword ordered in the array

+ sign of the first spectral values

- sign of the second spectral values

m msb's of the linear part of an ESC-value

x gaps, filled by the rest of the Huffman code and the lsb's

The advantage of the array, which is sent row by row as the bitstream, is that error propagation is restricted to the higher frequencies.
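The row arrangement described above can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not the coder's actual routine: codewords are plain bit strings given lowest frequency first, sign bits and ESC-value bits are left out, and the function name `pack_rows` and the handling of the bits left over after the last full row are hypothetical.

```python
def pack_rows(codewords, columns):
    """Arrange Huffman codewords (bit strings, lowest frequency first) in rows.

    Returns (rows, tail). Every row is `columns` bits long and starts with one
    low-frequency codeword; the gap behind it is filled with bits taken from
    the higher-frequency codewords. `tail` holds the bits that remain after
    the last full row (how these are sent is not modelled here).
    """
    if any(len(cw) > columns for cw in codewords):
        raise ValueError("columns must be at least the longest codeword length")
    total_bits = sum(len(cw) for cw in codewords)
    n_rows = total_bits // columns            # size of Part III / number of columns
    anchored = codewords[:n_rows]             # one codeword starts each row
    spill = "".join(codewords[n_rows:])       # higher-frequency codewords
    rows, pos = [], 0
    for cw in anchored:
        gap = columns - len(cw)
        rows.append(cw + spill[pos:pos + gap])
        pos += gap
    return rows, spill[pos:]


rows, tail = pack_rows(["101", "0", "1100", "11", "010", "1"], 4)
print(rows, tail)   # ['1011', '0101', '1100'] 01
```

Because each low-frequency codeword begins at a known row start, a bit error that desynchronizes the Huffman decoding disturbs only the gap-filling bits in this model, which belong to the higher frequencies.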

FIG. 3 shows a detailed block diagram of a decoder in accordance with aspects of the present invention. A synchronization buffer 310 appropriately buffers input bitstreams arriving on input lead 305. Error correction is then effected in the part of the system represented by block 315. This block also provides for extraction of low frequency spectral coefficients.

Side information extracted in block 320 is demultiplexed from the other arriving information and is sent to either the Huffman decoder 330 or the speech reconstruction functional elements 335. The actual coded spectral coefficient information is sent to the Huffman decoder itself. The decoder 330 is provided with a stored Huffman codebook equivalent to that maintained at the coder of FIG. 1. After the spectrum information is reconstructed, the MDCT synthesis (or other frequency synthesis operation) is applied to reverse the original frequency analysis performed preparatory to coding. Standard aliasing techniques are then applied to provide samples to be converted by digital-to-analog conversion and reproduction to acoustic or other analog signals.

    ______________________________________
    LISTING 1
    ______________________________________
    c      First startup routine
           subroutine strt()
    c      sets up threshold generation tables, ithr and bval
           real freq(0:25)/0.,100.,200.,300.,400.,500.,630.,770.,
         1 920.,1080.,1270.,1480.,1720.,2000.,2320.,2700.,
         1 3150.,3700.,4400.,5300.,6400.,7700.,9500.,12000.,15500.,
         1 25000./
           common/thresh/ithr(26),bval(257),rnorm(257)
           common/absthr/abslow(257)
           common/sigs/ifirst
    c      ithr(i) is bottom of critical band i. bval is bark index
    c      of each line
           write(*,*) 'what spl will +-32000 be ->'
           read(*,*) abslev
           abslev = abslev - 96.
           abslow = 5224245.*5224245./exp(9.6*alog(10.))
           ifirst = 0
           write(*,*) 'what is the sampling rate'
           read(*,*) rzotz
           fnyq = rzotz/2.
    c      nyquist frequency of interest.
           ithr(1) = 2.
           i = 2
    10     ithr(i) = freq(i - 1)/fnyq*256. + 2.
           i = i + 1
           if (freq(i - 1) .lt. fnyq) goto 10
    c      sets ithr to bottom of cb
           ithr(i:26) = 257
    c      now, set up the critical band indexing array
           bval(1) = 0
    c      first, figure out frequency, then . . .
           do i = 2,257,1
           fre = (i - 1)/256.*fnyq
    c      write(*,*) i,fre
    c      fre is now the frequency of the line. convert
    c      it to critical band number . . .
           do j = 0,25,1
           if (fre .gt. freq(j)) k = j
           end do
    c      so now, k = last CB lower than fre
           rpart = fre - freq(k)
           range = freq(k + 1) - freq(k)
           bval(i) = k + rpart/range
           end do
           rnorm = 1
           do i = 2,257,1
           tmp = 0
           do j = 2,257,1
           tmp = tmp + sprdngf(bval(j),bval(i))
           end do
           rnorm(i) = tmp
           end do
           rnorm = 1./rnorm
    c      do i = 1,257,1
    c      write(*,*) i,bval(i),10.*alog10(rnorm(i))
    c      end do
           call openas(0,'/usr/jj/nsrc/thrtry/freqlist',0)
           do i = 2,257,1
           read(0,*) ii,db
           if (ii .ne. i) then
           write(*,*) 'freqlist is bad.'
           stop
           end if
           db = exp((db - abslev)/10.*alog(10.))
    c      write(*,*) i,db
           abslow(i) = abslow(i)*db
           end do
           abslow(1) = 1.
           write(*,*) 'lowest level is ', sqrt(abslow(45))
           return
           end
    c      Threshold calculation program
           subroutine thrgen(rt,phi,thr)
           real r(257),phi(257)
           real rt(257)
           real thr(257)
           common/blnk/or(257),ophi(257),dr(257),dphi(257)
           common/blk1/othr(257)
           real alpha(257),tr(257),tphi(257)
           real beta(257),bcalc(257)
           common/absthr/abslow(257)
           common/thresh/ithr(26),bval(257),rnorm(257)
           common/sigs/ifirst
           r = max(rt,.0005)
           bcalc = 1.
           if (ifirst .eq. 0) then
           or = 0.
           othr = 1e20
           ophi = 0
           dr = 0
           dphi = 0
           ifirst = 1
           end if
    c      this subroutine figures out the new threshold values
    c      using line-by-line measurement.
           tr = or + dr
           tphi = ophi + dphi
           dr = r - or
           dphi = phi - ophi
           or = r
           ophi = phi
           alpha = sqrt((r*cos(phi) - tr*cos(tphi))
         1 *(r*cos(phi) - tr*cos(tphi))
         2 + (r*sin(phi) - tr*sin(tphi))
         3 *(r*sin(phi) - tr*sin(tphi)))
         4 /(r + abs(tr) + 1.)
           beta = alpha
    c      now, beta is the unweighted tonality factor
           alpha = r*r
    c      now, the energy is in each
    c      line. Must spread. (ecch)
    c      write(*,*) 'before spreading'
           thr = 0
           bcalc = 0
    cvdS1  cncall
           do i = 2,257,1
    cvdS1  cncall
           do j = 2,257,1
           glorch = sprdngf(bval(j),bval(i))
           thr(i) = alpha(j)*glorch + thr(i)
           bcalc(i) = alpha(j)*glorch*beta(j) + bcalc(i)
    c      thr is the spread energy. bcalc is the weighted chaos
           end do
    c      if (thr(i) .eq. 0) then
    c      write(*,*) 'zero threshold, you blew it'
    c      stop
    c      end if
           bcalc(i) = bcalc(i)/thr(i)
           if (bcalc(i) .gt. .5) bcalc(i) = 1. - bcalc(i)
    c      that normalizes bcalc to 0-.5
           end do
    c      write(*,*) 'after spreading'
           bcalc = max(bcalc,.05)
           bcalc = min(bcalc,.5)
    c      bcalc is now the chaos metric, convert to the
    c      tonality metric
           bcalc = -.45*alog(bcalc) - .299
    c      now calculate DB
           bcalc = max(24.5,(15.5 + bval))*bcalc + 5.5*(1. - bcalc)
           bcalc = exp((-bcalc/10.)*alog(10.))
    c      now, bcalc is actual tonality factor, for power
    c      space.
           thr = thr*rnorm*bcalc
    c      threshold is tonality factor times energy (with
    c      normalization)
           thr = max(thr,abslow)
           alpha = thr
           thr = min(thr,othr*2.)
           othr = alpha
    c      write(*,*) 'leaving thrgen'
           return
           end
    c      And, the spreading function
           function sprdngf(j,i)
           real i,j
           real sprdngf
    c      this calculates the value of the spreading function for
    c      the i'th bark, with the center being the j'th
    c      bark
           temp1 = i - j
           temp2 = 15.811389 + 7.5*(temp1 + .474)
           temp2 = temp2 - 17.5*sqrt(1. + (temp1 + .474)*
         1 (temp1 + .474))
           if (temp2 .le. -100.) then
           temp3 = 0.
           else
           temp2 = temp2/10.*alog(10.)
           temp3 = exp(temp2)
           end if
           sprdngf = temp3
           return
           end
    ______________________________________
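The per-line arithmetic of Listing 1 can be restated compactly. The following is a minimal Python transliteration offered as a sketch, not as the routine itself: it handles one spectral line at a time, omits the summation over all lines, the rnorm normalization and the comparison with the previous threshold, and uses illustrative function names (`predict`, `chaos`, `threshold_factor`). The spreading-function constant 15.811389 is chosen because it makes the function peak at exactly 0 dB for i - j = 0; the coefficient 0.45 is taken from the listing as printed.

```python
import math


def predict(prev, prev_delta):
    """Extrapolate a magnitude or phase: last value plus the last change."""
    return prev + prev_delta


def chaos(r, phi, tr, tphi):
    """Normalized Euclidean distance between the actual line (r, phi) and
    its prediction (tr, tphi); near 0 for predictable (tonal) lines."""
    dx = r * math.cos(phi) - tr * math.cos(tphi)
    dy = r * math.sin(phi) - tr * math.sin(tphi)
    return math.hypot(dx, dy) / (r + abs(tr) + 1.0)


def sprdngf(j, i):
    """Spreading function (power ratio) at bark i for a masker at bark j."""
    t = (i - j) + 0.474
    db = 15.811389 + 7.5 * t - 17.5 * math.sqrt(1.0 + t * t)
    return 0.0 if db <= -100.0 else 10.0 ** (db / 10.0)


def threshold_factor(c, bark):
    """Map a spread chaos value c at a given bark value to the power-domain
    factor that multiplies the spread energy."""
    if c > 0.5:
        c = 1.0 - c                      # fold into the range 0 to .5
    c = min(max(c, 0.05), 0.5)           # clamp as in the listing
    t = -0.45 * math.log(c) - 0.299      # tonality metric (coefficient as printed)
    offset_db = max(24.5, 15.5 + bark) * t + 5.5 * (1.0 - t)
    return 10.0 ** (-offset_db / 10.0)


# A line whose magnitude grew from 10 to 12 is predicted to reach 14.
print(predict(12.0, 12.0 - 10.0))
print(threshold_factor(0.05, 10.0) < threshold_factor(0.5, 10.0))
```

A tonal line (small chaos value) therefore yields a much smaller factor, that is, a lower permitted noise level, than a noise-like line at the same bark value.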

                      TABLE I
    ______________________________________
    Absolute Threshold File -
    ("freqlist" for start-up routine)
    ______________________________________
      1         56  3.    111 16.    166 16.    221 50.
      2 27.     57  4.    112 17.    167 16.    222 50.
      3 18.     58  4.    113 17.    168 16.    223 50.
      4 16.     59  5.    114 17.    169 16.    224 50.
      5 10.     60  5.    115 17.    170 16.    225 50.
      6  9.     61  5.    116 18.    171 17.    226 50.
      7  8.     62  6.    117 18.    172 17.    227 50.
      8  8.     63  6.    118 18.    173 17.    228 50.
      9  8.     64  6.    119 18.    174 17.    229 50.
     10  8.     65  6.    120 18.    175 17.    230 50.
     11  8.     66  7.    121 18.    176 17.    231 50.
     12  7.     67  7.    122 18.    177 18.    232 50.
     13  7.     68  7.    123 18.    178 18.    233 50.
     14  7.     69  8.    124 17.    179 18.    234 60.
     15  7.     70  9.    125 17.    180 18.    235 60.
     16  7.     71 10.    126 16.    181 18.    236 60.
     17  7.     72 10.    127 16.    182 19.    237 60.
     18  7.     73 10.    128 16.    183 19.    238 60.
     19  7.     74 10.    129 16.    184 19.    239 60.
     20  7.     75 10.    130 15.    185 19.    240 60.
     21  7.     76 10.    131 15.    186 19.    241 60.
     22  7.     77 10.    132 15.    187 20.    242 60.
     23  7.     78 10.    133 15.    188 21.    243 60.
     24  7.     79 10.    134 14.    189 22.    244 60.
     25  6.     80 10.    135 14.    190 23.    245 60.
     26  5.     81 11.    136 13.    191 24.    246 60.
     27  5.     82 11.    137 12.    192 25.    247 60.
     28  5.     83 11.    138 12.    193 26.    248 60.
     29  5.     84 11.    139 12.    194 27.    249 60.
     30  5.     85 11.    140 12.    195 28.    250 60.
     31  4.     86 12.    141 12.    196 29.    251 60.
     32  4.     87 12.    142 12.    197 30.    252 60.
     33  4.     88 12.    143 12.    198 31.    253 60.
     34  4.     89 12.    144 13.    199 32.    254 60.
     35  4.     90 12.    145 13.    200 33.    255 60.
     36  3.     91 12.    146 14.    201 34.    256 60.
     37  3.     92 13.    147 14.    202 35.    257 60.
     38  3.     93 13.    148 14.    203 36.
     39  3.     94 13.    149 14.    204 37.
     40  2.     95 13.    150 14.    205 38.
     41  2.     96 13.    151 14.    206 39.
     42  1.     97 13.    152 14.    207 40.
     43  1.     98 14.    153 14.    208 41.
     44  1.     99 14.    154 14.    209 42.
     45  1.    100 14.    155 14.    210 43.
     46  0.    101 14.    156 15.    211 44.
     47  0.    102 15.    157 15.    212 45.
     48  0.    103 15.    158 15.    213 46.
     49  0.    104 15.    159 15.    214 47.
     50  0.    105 15.    160 15.    215 48.
     51  0.    106 15.    161 15.    216 49.
     52  2.    107 16.    162 15.    217 50.
     53  2.    108 16.    163 15.    218 50.
     54  2.    109 16.    164 15.    219 50.
     55  3.    110 16.    165 15.    220 50.
    ______________________________________
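The start-up routine of Listing 1 turns each Table I entry into a power-domain floor for the threshold. The following Python sketch restates that conversion for a single entry. It is an illustration under stated assumptions: the function name `absolute_threshold` and its argument names are hypothetical, the table values are taken to be in dB, and the constant 5224245 is simply copied from the listing without interpretation.

```python
FULL_SCALE_POWER = 5224245.0 * 5224245.0   # scaling constant copied from Listing 1


def absolute_threshold(table_db, spl_of_full_scale):
    """Convert one Table I entry (assumed dB) to the power-domain floor.

    spl_of_full_scale is the answer to the listing's prompt asking which
    sound pressure level a +-32000 amplitude corresponds to.
    """
    abslev = spl_of_full_scale - 96.0
    base = FULL_SCALE_POWER / 10.0 ** 9.6
    return base * 10.0 ** ((table_db - abslev) / 10.0)


# Entry for line 2 of Table I (27 dB) with full scale played back at 96 dB SPL.
print(absolute_threshold(27.0, 96.0))
```

In this form a 10 dB larger table entry raises the floor by a factor of ten, and a 10 dB louder playback level lowers it by the same factor.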

                      TABLE 2
    ______________________________________
    Table of critical bands and fmin
    (used at 48 kHz sampling frequency)

    The upper band edge is set to 20 kHz (line 214 at block
    length 256, line 428 at block length 512).
    The following table is used at block length 512. The table
    for block length 256 can easily be calculated from the table
    for block length 512. The tables for other sampling rates
    can also be calculated from this list.

    cb     start          width  fmin
    ______________________________________
     1       0              4    .007
     2       4              4    .007
     3       8              4    .007
     4      12              4    .007
     5      16              4    .007
     6      20              4    .007
     7      24              4    .007
     8      28              4    .01
     9      32              4    .01
    10      36              4    .01
    11      40              6    .01
    12      46              6    .0144
    13      52              8    .0225
    14      60              8    .04
    15      68             12    .0625
    16      80             12    .09
    17      92             16    .09
    18     108             20    .09
    19     128             26    .1225
    20     154             30    .1225
    21     184             38    .16
    22     222             50    .2025
    23     272             70    .25
    24     342             86
    ______________________________________
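The band edges of Table 2 can be checked mechanically. The Python sketch below verifies that the bands are contiguous and end at line 428, and then derives a table for block length 256. The derivation rule (halving every start and width) is an assumption; the table only states that the shorter table can be calculated from the longer one, and halving happens to reproduce the stated upper edge at line 214. The names `TABLE_512`, `is_contiguous`, `halve` and `upper_edge` are illustrative.

```python
# (cb, start, width) for block length 512, copied from Table 2.
TABLE_512 = [
    (1, 0, 4), (2, 4, 4), (3, 8, 4), (4, 12, 4), (5, 16, 4), (6, 20, 4),
    (7, 24, 4), (8, 28, 4), (9, 32, 4), (10, 36, 4), (11, 40, 6),
    (12, 46, 6), (13, 52, 8), (14, 60, 8), (15, 68, 12), (16, 80, 12),
    (17, 92, 16), (18, 108, 20), (19, 128, 26), (20, 154, 30),
    (21, 184, 38), (22, 222, 50), (23, 272, 70), (24, 342, 86),
]


def is_contiguous(table):
    """True if every band starts where the previous band ends."""
    return all(s + w == nxt for (_, s, w), (_, nxt, _) in zip(table, table[1:]))


def upper_edge(table):
    """First line above the last critical band."""
    _, start, width = table[-1]
    return start + width


def halve(table):
    """Assumed rule for block length 256: halve each start and width."""
    return [(cb, s // 2, w // 2) for cb, s, w in table]


table_256 = halve(TABLE_512)
print(upper_edge(TABLE_512), upper_edge(table_256))   # 428 214
```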

    ______________________________________
    17 17 16 16 16 16 16 16 16 16 16 16 16 16 16 16
    16 16 16 16 16 16 16 16 17 17 17 17 17 17 17 14
    17 17 17 17 16 16 16 16 16 16 16 16 16 16 16 16
    16 16 16 16 16 16 16 16 17 16 17 17 17 17 17 14
    18 17 17 17 17 16 16 16 16 16 16 16 16 16 16 16
    16 16 16 16 16 16 16 16 17 17 17 17 17 17 17 14
    18 17 17 17 17 17 17 16 17 16 16 16 16 16 16 16
    16 16 16 16 16 16 16 16 16 17 17 17 17 17 17 13
    18 18 17 17 17 17 17 17 17 17 16 17 17 16 17 16
    17 17 17 16 16 16 17 17 17 17 17 17 17 17 17 14
    18 18 17 17 17 17 17 17 17 17 17 17 17 17 17 17
    17 17 17 17 17 17 17 17 17 17 17 17 17 17 18 14
    18 18 17 17 17 17 17 17 17 17 17 17 17 17 17 17
    17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 14
    18 18 18 18 18 17 17 17 17 17 17 17 17 17 17 17
    17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 14
    19 18 18 18 18 18 18 17 17 18 17 17 17 17 17 17
    17 17 17 17 17 17 17 17 17 17 17 17 17 17 18 13
    19 19 18 18 18 18 18 18 18 17 17 18 17 17 17 17
    17 17 17 17 17 17 17 17 17 17 17 17 17 17 18 14
    15 15 15 14 14 14 14 14 14 14 14 14 14 14 14 14
    14 14 14 14 14 14 14 14 14 14 14 13 14 14 14  8
    ______________________________________

We claim:
 1. A method of processing an ordered time sequence of audio signals partitioned into contiguous blocks of samples, each such block having a discrete short-time spectrum, S(ω_(i)), i=1,2, . . . , N, for each of said blocks, comprising predicting, for each block .Iadd.of audio signals.Iaddend., an estimate of the values for each S(ω_(i)) based on the values for S(ω_(i)) for one or more prior blocks, determining for each frequency, ω_(i), a randomness metric based on the predicted value for each S(ω_(i)) and the actual value for S(ω_(i)) for each block, based on said randomness metrics, and the distribution of power with frequency in the block, determining the value of a tonality function as a function of frequency, and based on said tonality function, estimating the noise masking threshold at each ω_(i) for the block.
 2. The method of claim 1 further comprising quantizing said S(ω_(i)) based on said noise masking threshold at each respective ω_(i).
 3. The method of claim 1 wherein said step of predicting comprises, for each ω_(i), forming the difference between the value of S(ω_(i)) for the corresponding ω_(i) from the two preceding blocks, and adding said difference to the value for S(ω_(i)) from the immediately preceding block.
 4. The method of claim 3, wherein said S(ω_(i)) is represented in terms of .[.its.]. magnitude and phase, and wherein said difference and adding are effected separately for the magnitude and phase of S(ω_(i)).
 5. The method of claim 1, wherein said determining of said randomness metric is accomplished by calculating the euclidian distance between said estimate of S(ω_(i)) and said actual value for S(ω_(i)).
 6. The method of claim 5, wherein said determining of said randomness metric further comprises normalizing said euclidian distance with respect to the sum of the magnitude of said actual magnitude for S(ω_(i)) and the absolute value of said estimate of S(ω_(i)).
 7. The method of claim 1, wherein said estimating of the noise masking threshold at each ω_(i) comprises calculating an unspread threshold function, and modifying said unspread threshold function in accordance with a spreading function to generate a spread threshold function.
 8. The method of claim 7, wherein said estimating of the noise masking threshold function further comprises modifying said spread threshold function in response to an absolute noise masking threshold for each ω_(i) to form a limited spread threshold function.
 9. The method of claim 8, further comprising modifying said limited threshold function to eliminate any existing pre-echoes, thereby generating an output threshold function value for each ω_(i).
 10. The method of any of claims 1, 7, 8 or 9, further comprising the steps of generating an estimate of the number of bits necessary to encode S(ω_(i)), quantizing said S(ω_(i)) to form quantized representations of said S(ω_(i)) using said estimate of the number of bits, and providing to a medium a coded representation of said quantized values and information about how said quantized values were derived.
 11. A method for processing an ordered sequence of coded signals comprising first code signals representing values of the frequency components of a block of values of an audio signal and second code signals representing information about how said first .Iadd.code .Iaddend.signals were derived to reproduce said audio signal with reduced perceptual error, said method comprising using said second .Iadd.code .Iaddend.signals to determine quantizing levels for said audio signal which reflect a reduced level of perceptual distortion, reconstructing quantized values for said frequency .[.content.]. .Iadd.components .Iaddend.of said audio signal in accordance with said quantizing levels, and transforming said reconstructed quantized .[.spectrum.]. .Iadd.values .Iaddend.to recover an estimate of the audio signal.
 12. The method of claim 11 wherein said reconstructing comprises using said second .Iadd.code .Iaddend.signals to effect scaling of said quantized values.
 13. The method of claim 11 wherein said reconstructing comprises applying a global gain factor based on said second .Iadd.code .Iaddend.signals.
 14. The method of claim 11 wherein said reconstructing comprises determining quantizer step size as a function of frequency component.
 15. The method of claim 11 wherein said second .Iadd.code .Iaddend.signals include information about the degree of coarseness of quantization as a function of frequency component.
 16. The method of claim 11 wherein said second .Iadd.code .Iaddend.signals include information about the number of values of said audio signal that occur in each block.
 .Iadd.17. A method of processing an ordered time sequence of audio signals partitioned into a set of ordered blocks, each said block having a discrete frequency spectrum comprising a first set of frequency coefficients, the method comprising, for each said block, the steps of: (a) grouping said first set of frequency coefficients into a plurality of frequency groups, each of said frequency groups comprising at least one frequency coefficient; (b) determining for frequency coefficients in each of said frequency groups a randomness metric, said randomness metrics reflecting the predictability of said frequency coefficients; (c) based on said randomness metrics, determining the value of a tonality function signal as a function of frequency; and (d) based on said tonality function signal, estimating a noise masking threshold for frequency coefficients in each frequency group..Iaddend.
 .Iadd.18. The method of claim 17 further comprising quantizing at least one frequency coefficient in said first set of frequency coefficients based on said noise masking threshold for each frequency coefficient being quantized..Iaddend.
 .Iadd.19. The method of claim 18 wherein said step of quantizing comprises assigning quantizing levels for each of said frequency coefficients in each of said frequency groups such that noise contributed by said quantizing falls below said noise masking threshold for the respective frequency group..Iaddend.
 .Iadd.20. A method of processing an ordered time sequence of audio signals partitioned into a set of ordered blocks, each said block having a discrete frequency spectrum comprising a first set of frequency coefficients, the method comprising, for each said block, the steps of (a) grouping said first set of frequency coefficients into a plurality of frequency groups, each of said frequency groups comprising at least one frequency coefficient; and (b) generating a set of tonality index signals, said set of tonality index signals comprising a tonality index signal for each of said frequency groups, said set of tonality index signals being based on at least one of said first set of frequency coefficients corresponding to at least one previous block..Iaddend.
 .Iadd.21. The method of claim 20 further comprising generating, based on the set of tonality index signals, a set of respective noise masking thresholds..Iaddend.
 .Iadd.22. The method of claim 21 further comprising quantizing at least one frequency coefficient in said first set of frequency coefficients based on said noise masking threshold for the band comprising the frequency coefficient being quantized..Iaddend.
 .Iadd.23. The method of claim 22 wherein said step of quantizing comprises assigning quantizing levels for each of said frequency coefficients in each of said frequency groups such that noise contributed by said quantizing falls below said noise masking threshold for each respective frequency coefficient..Iaddend.
 .Iadd.24. A storage medium adapted for use with a decoder, the storage medium manufactured in accordance with a process comprising the steps of (a) processing an ordered time sequence of audio signals partitioned into a set of ordered blocks, each said block having a discrete frequency spectrum comprising a first set of frequency coefficients; and (b) for each block: (1) grouping said first set of frequency coefficients into a plurality of frequency groups, each of said frequency groups comprising at least one frequency coefficient; (2) determining for each of said frequency coefficients in said frequency groups a randomness metric, said randomness metrics reflecting the predictability of said frequency coefficients; (3) based on said randomness metrics, determining the value of a tonality function as a function of frequency; (4) based on said tonality function, estimating a noise masking threshold for each frequency group; (5) quantizing each of said frequency coefficients such that noise contributed by said quantizing falls below said noise masking threshold for the frequency group comprising the frequency coefficient being quantized; (6) applying a recording signal to said storage medium, thereby causing said storage medium to store said recording signal, said recording signal comprising signals representing (i) said quantized frequency coefficients; and (ii) side information for controlling said decoder in reconstructing said audio signal from said recording signal upon retrieval of said recording signal from said storage medium, said side information comprising quantizing information relating to said quantizing of frequency coefficients..Iaddend.
 .Iadd.25. The method of claim 24 wherein said storage medium is a compact disc..Iaddend.
 .Iadd.26. The method of claim 24 wherein said storage medium is a magnetic storage means..Iaddend.
 .Iadd.27. 
A method of transmitting audio signals, themethod comprising:(a) processing an ordered time sequence of audiosignals partitioned into a set of ordered blocks, each said block havinga discrete frequency spectrum comprising a first set of frequencycoefficients; and (b) for each block:(1) grouping said first set offrequency coefficients into a plurality of frequency groups, each ofsaid frequency groups comprising at least one frequency coefficient; (2)determining for each of said frequency coefficients in said frequencygroups a randomness metric, said randomness metrics reflecting thepredictability of said frequency coefficients; (3) based on saidrandomness metrics, determining the value of a tonality function as aunction of frequency; (4) based on said tonality function, estimating anoise masking threshold for each frequency group; (5) quantizing each ofsaid frequency coefficients such that noise contributed by saidquantizing falls below said noise masking threshold for the frequencygroup comprising the frequency coefficient being quantized; (6) applyinga transmission signal to a transmission medium, said transmission signalcomprising signals representing said quantized frequencycoefficients..Iaddend..Iadd.28. The method of claim 27 wherein saidtransmission medium is a broadcast transmissionmedium..Iaddend..Iadd.29. The method of claim 27 wherein saidtransmission medium is an electrical conductingmedium..Iaddend..Iadd.30. The method of claim 27 wherein saidtransmission medium is an optical transmission medium..Iaddend..Iadd.31.The method of any of claims 17, 20, or 27 wherein said processingfurther comprises generating discrete frequency spectrumsignals..Iaddend..Iadd.32. The method of claim 31 wherein saidgenerating of discrete frequency spectrum signals comprises generatingdiscrete Fourier coefficient signals..Iaddend.
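By way of illustration only, and without limiting the claims, the per-block steps recited in claims 17 and 20 (grouping coefficients, forming a randomness metric from prior blocks, deriving a tonality index, and estimating a noise masking threshold) might be sketched as follows. The sketch rests on several assumptions that are not recited above: the prediction is a linear extrapolation of amplitude and phase from the two preceding blocks, the tonality index is taken as one minus the energy-weighted randomness of the group, the masking offsets `TONAL_OFFSET_DB` and `NOISE_OFFSET_DB` are placeholder values, and spreading of masking across groups is omitted. The function names and the `groups` index ranges are hypothetical.

```python
import cmath

# Hypothetical masking offsets in dB (assumed values, not taken from the claims).
TONAL_OFFSET_DB = 18.0   # tone-like group: threshold well below the group energy
NOISE_OFFSET_DB = 6.0    # noise-like group: threshold closer to the group energy


def randomness_metric(curr, prev1, prev2):
    """Per-coefficient randomness in [0, 1].

    Each coefficient is predicted by extrapolating amplitude and phase from
    the two previous blocks (an assumed predictor); the metric is the
    Euclidean distance between actual and predicted values, normalized by
    the sum of their magnitudes.
    """
    out = []
    for c, p1, p2 in zip(curr, prev1, prev2):
        r_pred = 2.0 * abs(p1) - abs(p2)
        phi_pred = 2.0 * cmath.phase(p1) - cmath.phase(p2)
        pred = cmath.rect(r_pred, phi_pred)
        denom = abs(c) + abs(r_pred)
        out.append(abs(c - pred) / denom if denom > 0.0 else 0.0)
    return out


def masking_thresholds(curr, prev1, prev2, groups):
    """Return a (tonality, threshold) pair for each frequency group.

    `groups` is a list of (start, stop) index ranges into the coefficient
    list. Tonality is 1 for a fully predictable group and 0 for a fully
    unpredictable one; the threshold is the group energy lowered by an
    offset interpolated between the two assumed dB values.
    """
    rand = randomness_metric(curr, prev1, prev2)
    result = []
    for start, stop in groups:
        energies = [abs(c) ** 2 for c in curr[start:stop]]
        total = sum(energies)
        if total > 0.0:
            weighted = sum(e * r for e, r in zip(energies, rand[start:stop])) / total
        else:
            weighted = 1.0  # an empty or silent group is treated as noise-like
        tonality = min(1.0, max(0.0, 1.0 - weighted))
        offset_db = tonality * TONAL_OFFSET_DB + (1.0 - tonality) * NOISE_OFFSET_DB
        result.append((tonality, total * 10.0 ** (-offset_db / 10.0)))
    return result


# Group 0 holds a steady tone whose phase advances uniformly from block to
# block; group 1 holds a coefficient that flips sign against its prediction.
prev2 = [cmath.rect(1.0, 0.1), cmath.rect(1.0, 0.0)]
prev1 = [cmath.rect(1.0, 0.2), cmath.rect(1.0, 0.0)]
curr = [cmath.rect(1.0, 0.3), cmath.rect(1.0, cmath.pi)]
groups = [(0, 1), (1, 2)]
thresholds = masking_thresholds(curr, prev1, prev2, groups)
```

In this sketch the predictable group receives the lower threshold, so a quantizer of the kind recited in claims 19 and 23 would have to keep its noise lower in that group than in the unpredictable one of equal energy.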